Grammars & Parsing
Learnability Matters: Active Learning for Video Captioning
This work focuses on active learning for video captioning. In particular, we propose to address the learnability problem in active learning, which is caused by collective outliers in video captioning and has been neglected in the literature. To start with, we conduct a comprehensive study of collective outliers, exploring their hard-to-learn property and concluding that ground-truth inconsistency is one of the main causes. Motivated by this, we design a novel active learning algorithm that takes three complementary aspects into account: learnability, diversity, and uncertainty. Ideally, learnability is reflected by ground-truth consistency. Under the active learning scenario, where ground truths are not available until humans are involved, we measure the consistency on estimated ground truths, using predictions from off-the-shelf models as approximations to the ground truths. These predictions are further used to estimate sample frequency and reliability, evincing diversity and uncertainty respectively. With the help of our novel caption-wise active learning protocol, our algorithm leverages knowledge from humans in a more effective and intelligent manner. Results on publicly available video captioning datasets with diverse video captioning models demonstrate that our algorithm outperforms SOTA active learning methods by a large margin, e.g., we achieve about 103% of full performance on CIDEr with 25% of human annotations on MSR-VTT.
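To make the scoring concrete, below is a minimal sketch (not the paper's implementation) of how ground-truth consistency could be estimated from off-the-shelf predictions: agreement among several models' captions for the same clip serves as a proxy for learnability, with a simple token-overlap F1 standing in for a proper captioning metric such as CIDEr. All function names, and the weighted-sum combination rule, are hypothetical assumptions.

```python
from itertools import combinations

def token_f1(a, b):
    # Crude stand-in for a captioning metric such as CIDEr.
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    overlap = len(sa & sb)
    p, r = overlap / len(sb), overlap / len(sa)
    return 2 * p * r / (p + r) if p + r else 0.0

def learnability(predictions):
    """Mean pairwise agreement among captions predicted for one clip by
    several off-the-shelf models. High agreement suggests consistent
    (estimated) ground truths; low agreement flags a likely collective
    outlier."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)

def acquisition_score(predictions, diversity, uncertainty, w=(1.0, 1.0, 1.0)):
    # Hypothetical combination: the abstract does not specify how the three
    # scores are merged, so a weighted sum is used here for illustration.
    return (w[0] * learnability(predictions)
            + w[1] * diversity
            + w[2] * uncertainty)
```

For instance, learnability(["a man is cooking", "a man cooks food"]) is high, while adding an unrelated prediction such as "a person plays guitar" drags the score down, marking the clip as harder to learn.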
A Simulating

Below is an FGG for derivations of a PCFG in Chomsky normal form. The largest right-hand side has 3 variables, so k = 2. The variables range over nonterminals, so m = |N|, where N is the CFG's nonterminal alphabet.

B.1 Plate diagrams

Plate diagrams are extensions of graphs that describe repeated structure in Bayesian networks (Buntine, 1994) or factor graphs (Obermeyer et al., 2019). A plate is a subset of variables/factors, together with a count M, indicating that the variables/factors inside the plate are to be replicated M times. However, there cannot be edges between different instances of a plate.
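As a concrete illustration of plate semantics, the sketch below (a hypothetical helper, not from the paper) replicates a plate's variables and factors M times; because every replicated factor references only variables from its own copy, no edges can connect different instances.

```python
def expand_plate(variables, factors, M):
    """Instantiate a plate by replicating its variables/factors M times.

    variables: names of the variables inside the plate.
    factors: list of (factor_name, [variable_names]) whose edges reference
    only in-plate variables, so distinct instances never share an edge.
    """
    out_vars, out_factors = [], []
    for i in range(M):
        out_vars += [f"{v}_{i}" for v in variables]
        out_factors += [(f"{f}_{i}", [f"{v}_{i}" for v in vs])
                        for f, vs in factors]
    return out_vars, out_factors

# e.g. expand_plate(["X"], [("emit", ["X"])], 3)
# -> (["X_0", "X_1", "X_2"],
#     [("emit_0", ["X_0"]), ("emit_1", ["X_1"]), ("emit_2", ["X_2"])])
```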
Factor Graph Grammars
We propose the use of hyperedge replacement graph grammars for factor graphs, or factor graph grammars (FGGs) for short. FGGs generate sets of factor graphs and can describe a more general class of models than plate notation, dynamic graphical models, case-factor diagrams, and sum-product networks can. Moreover, inference can be done on FGGs without enumerating all the generated factor graphs. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. For finite sets of graphs (but possibly infinite variable domains), an FGG can be converted to a single factor graph amenable to standard inference techniques.
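The following toy sketch (all rule, variable, and factor names are hypothetical, and the attachment machinery is simplified to a single variable per nonterminal edge) conveys the flavor of FGG generation: a nonterminal hyperedge is repeatedly rewritten into a fragment of fresh variables and factors, so one grammar yields an infinite family of chain-structured factor graphs, one per derivation.

```python
import random

def derive(max_steps=50, p_stop=0.5):
    """Sample one factor graph from an HMM-like toy grammar by repeatedly
    rewriting the dangling nonterminal edge until none remains."""
    counter = [0]
    def fresh():
        counter[0] += 1
        return f"T{counter[0]}"

    variables = [fresh()]                     # first state variable
    factors = [("init", (variables[0],))]     # factor on the first state
    nonterminals = [("X", variables[0])]      # one dangling nonterminal edge

    while nonterminals and counter[0] < max_steps:
        _, attach = nonterminals.pop()
        if random.random() < p_stop:
            # stopping rule: rewrite X into a final factor only
            factors.append(("final", (attach,)))
        else:
            # continuation rule: fresh variable, transition factor, recurse
            t = fresh()
            variables.append(t)
            factors.append(("transition", (attach, t)))
            nonterminals.append(("X", t))
    return variables, factors

# Each call returns a chain of different length; the grammar compactly
# describes the whole (infinite) set of such factor graphs.
```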
Appendix: Structured Reordering for Modeling Latent Alignments in Sequence Transduction

…the inside weight for each segment, which is the total weight of all derivations with root X.
WCFG to PCFG Conversion

The algorithm for converting a WCFG to its equivalent PCFG is shown in Algorithm 1; a simplified sketch of the idea follows this section. A full proof of this equivalence can be found in Smith and Johnson [1].

Proof of the Dynamic Programming for Marginal Inference

We prove the correctness of the dynamic programming algorithm for computing the marginal permutation matrix of separable permutations by induction, as follows. As the base case, each word (i.e., each segment of length 1) is associated with the identity permutation matrix 1.

Architecture and Hyperparameters

The detailed architecture of ReMoto, our seq2seq model for semantic parsing, is shown in Figure 1 (view in color). First, the structured reordering module generates a (relaxed) permutation matrix given the input utterance. Then, the encoding module generates representations of the input utterance based on the reordered embeddings, which are computed from the original embeddings and the permutation matrix produced in the first step.
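As the promised sketch of the conversion (a simplified rendering of the idea behind Algorithm 1, not its exact statement): each WCFG rule weight is renormalized by inside weights Z(X), the total weight of all derivations rooted at X, computed here by fixed-point iteration under the assumption that the grammar's weights make it converge. The rule-table format is hypothetical.

```python
def inside_weights(rules, iters=100):
    """Z[X] = total weight of derivations rooted at X, by fixed-point
    iteration (assumes convergence). Terminals contribute weight 1."""
    Z = {x: 0.0 for x in rules}
    for _ in range(iters):
        for x, prods in rules.items():
            total = 0.0
            for rhs, w in prods:
                prod = w
                for sym in rhs:
                    prod *= Z[sym] if sym in rules else 1.0
                total += prod
            Z[x] = total
    return Z

def wcfg_to_pcfg(rules):
    """p(X -> alpha) = w(X -> alpha) * prod_{Y in alpha} Z(Y) / Z(X)."""
    Z = inside_weights(rules)
    pcfg = {}
    for x, prods in rules.items():
        pcfg[x] = []
        for rhs, w in prods:
            p = w / Z[x]
            for sym in rhs:
                if sym in rules:
                    p *= Z[sym]
            pcfg[x].append((rhs, p))
    return pcfg

# e.g. rules = {"S": [(("S", "S"), 0.1), (("a",), 1.0)]}
# Z(S) solves Z = 0.1*Z^2 + 1, giving Z ~ 1.127; the resulting rule
# probabilities 0.1*Z ~ 0.113 and 1/Z ~ 0.887 sum to 1 as required.
```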
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more about particular scales. Concretely, we apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part of speech tagging (word-level), dialog speech acts classification (utterance-level), or topic classification (document-level), while performing poorly on the other tasks. We also present a prism layer for training models, which uses spectral filters to constrain different neurons to model structure at different scales. Our proposed BERT + Prism model can better predict masked tokens using long-range context and produces multiscale representations that perform better at utterance-and document-level tasks. Our methods are general and readily applicable to other domains besides language, such as images, audio, and video.
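A minimal sketch of the filtering step (the helper name and band parameterization are illustrative assumptions, not the paper's exact configuration): each neuron's activation sequence across the input is transformed with an FFT, frequencies outside the chosen band are zeroed, and the signal is transformed back, so low bands keep slow document-scale variation and high bands keep fast token-scale variation.

```python
import numpy as np

def spectral_filter(activations, low, high):
    """Band-pass filter each neuron's activation sequence.

    activations: (T, d) array -- one row per token, one column per neuron.
    low, high: band limits as fractions of the one-sided spectrum, e.g.
    (0.0, 0.1) keeps only slow, document-scale variation.
    """
    T = activations.shape[0]
    freqs = np.fft.rfftfreq(T)              # normalized frequencies in [0, 0.5]
    spec = np.fft.rfft(activations, axis=0) # transform along the token axis
    mask = (freqs >= low * 0.5) & (freqs <= high * 0.5)
    spec[~mask] = 0.0                       # zero frequencies outside the band
    return np.fft.irfft(spec, n=T, axis=0)  # back to the token domain
```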
Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing
Yongrui Chen, Guilin Qi
Continual table semantic parsing aims to train a parser on a sequence of tasks, where each task requires the parser to translate natural language into SQL based on task-specific tables but offers only limited training examples. Conventional methods tend to suffer from overfitting under limited supervision, as well as catastrophic forgetting due to parameter updates. Despite recent advancements that partially alleviate these issues through semi-supervised data augmentation and retention of a few past examples, performance is still limited by the volume of unsupervised data and stored examples. To overcome these challenges, this paper introduces a novel method integrating parameter-efficient fine-tuning (PEFT) and in-context tuning (ICT) for training a continual table semantic parser. First, we present a task-adaptive PEFT framework capable of fully circumventing catastrophic forgetting, achieved by freezing the pre-trained model backbone and fine-tuning small-scale prompts. Building on this, we propose a solution based on a teacher-student framework. The teacher addresses the few-shot problem using ICT, which procures contextual information by demonstrating a few training examples. In turn, the student leverages the proposed PEFT framework to learn from the teacher's output distribution, then compresses and saves the contextual information into the prompts, eliminating the need to store any training examples.
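A minimal sketch of the two ingredients, assuming a HuggingFace-style encoder exposing config.hidden_size and accepting inputs_embeds (both assumptions; the class and function names are hypothetical): the backbone is frozen so later tasks cannot overwrite it, a small per-task prompt is the only trainable tensor, and the student matches the ICT teacher's output distribution via a temperature-scaled KL loss.

```python
import torch
import torch.nn.functional as F

class PromptTunedParser(torch.nn.Module):
    def __init__(self, backbone, prompt_len=20):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # frozen backbone: no forgetting
        d = backbone.config.hidden_size
        # one small trainable prompt per task; only this tensor is updated
        self.prompt = torch.nn.Parameter(torch.randn(prompt_len, d) * 0.02)

    def forward(self, input_embeds):
        # prepend the task prompt to the (frozen) input embeddings
        prompt = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return self.backbone(inputs_embeds=torch.cat([prompt, input_embeds], 1))

def distill_loss(student_logits, teacher_logits, T=2.0):
    # the student matches the ICT teacher's output distribution
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
```

Because only the prompt receives gradients, the context demonstrated to the teacher is effectively compressed into a few thousand parameters rather than stored as raw examples.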
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing Supplementary Material
VLM Evaluation

To evaluate two VLMs (Frozen in Time [1] and VideoCLIP [13]), we use a hybrid approach that leverages both prototypical networks [11] and the video-language similarity metrics learned by each model. Below, we show an ablation study in which we use only the video prototype networks. We show the performance of using only language similarity in the few-shot case to demonstrate the effects of sample removal, and we also show the effects of our hybrid weighting scheme, where we weight the language embeddings five times more than the video embeddings when constructing the hybrid prototype (as opposed to the equal weighting used in the regular hybrid approach). We perform our ablation study with Frozen in Time and use the same weighting scheme and prototype strategy for VideoCLIP. For this study, we report activity and sub-activity classification accuracy in the 5-shot case, and we indicate whether a given method uses language, video, or both to create its prototype embeddings.
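A minimal sketch of the hybrid prototype construction (tensor layout and function names are our assumptions): class prototypes are built from the normalized means of the language and video embeddings, with the language side scaled five times higher for the weighting-scheme ablation, and queries are assigned to the nearest prototype by cosine similarity.

```python
import torch
import torch.nn.functional as F

def hybrid_prototypes(video_embs, lang_embs, labels, n_classes, lang_weight=5.0):
    """Class prototypes as a weighted mean of language and video embeddings.

    lang_weight=5.0 reproduces the 5x language weighting ablated above;
    lang_weight=1.0 gives the regular (equal-weight) hybrid prototype.
    """
    protos = []
    for c in range(n_classes):
        idx = labels == c                      # support examples of class c
        v = F.normalize(video_embs[idx].mean(0), dim=-1)
        l = F.normalize(lang_embs[idx].mean(0), dim=-1)
        protos.append(F.normalize(lang_weight * l + v, dim=-1))
    return torch.stack(protos)

def classify(query_video_emb, protos):
    # assign the query to the nearest prototype by cosine similarity
    q = F.normalize(query_video_emb, dim=-1)
    return (protos @ q).argmax()
```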